43 research outputs found

    The Brain on Low Power Architectures - Efficient Simulation of Cortical Slow Waves and Asynchronous States

    Full text link
    Efficient brain simulation is a scientific grand challenge, a parallel/distributed coding challenge and a source of requirements and suggestions for future computing architectures. Indeed, the human brain includes about 10^15 synapses and 10^11 neurons activated at a mean rate of several Hz. Full brain simulation poses Exascale challenges even if simulated at the highest abstraction level. The WaveScalES experiment in the Human Brain Project (HBP) has the goal of matching experimental measures and simulations of slow waves during deep-sleep and anesthesia and the transition to other brain states. The focus is the development of dedicated large-scale parallel/distributed simulation technologies. The ExaNeSt project designs an ARM-based, low-power HPC architecture scalable to million of cores, developing a dedicated scalable interconnect system, and SWA/AW simulations are included among the driving benchmarks. At the joint between both projects is the INFN proprietary Distributed and Plastic Spiking Neural Networks (DPSNN) simulation engine. DPSNN can be configured to stress either the networking or the computation features available on the execution platforms. The simulation stresses the networking component when the neural net - composed by a relatively low number of neurons, each one projecting thousands of synapses - is distributed over a large number of hardware cores. When growing the number of neurons per core, the computation starts to be the dominating component for short range connections. This paper reports about preliminary performance results obtained on an ARM-based HPC prototype developed in the framework of the ExaNeSt project. Furthermore, a comparison is given of instantaneous power, total energy consumption, execution time and energetic cost per synaptic event of SWA/AW DPSNN simulations when executed on either ARM- or Intel-based server platforms

    Gaussian and exponential lateral connectivity on distributed spiking neural network simulation

    Get PDF
    We measured the impact of long-range exponentially decaying intra-areal lateral connectivity on the scaling and memory occupation of a distributed spiking neural network simulator compared to that of short-range Gaussian decays. While previous studies adopted short-range connectivity, recent experimental neurosciences studies are pointing out the role of longer-range intra-areal connectivity with implications on neural simulation platforms. Two-dimensional grids of cortical columns composed by up to 11 M point-like spiking neurons with spike frequency adaption were connected by up to 30 G synapses using short- and long-range connectivity models. The MPI processes composing the distributed simulator were run on up to 1024 hardware cores, hosted on a 64 nodes server platform. The hardware platform was a cluster of IBM NX360 M5 16-core compute nodes, each one containing two Intel Xeon Haswell 8-core E5-2630 v3 processors, with a clock of 2.40 G Hz, interconnected through an InfiniBand network, equipped with 4x QDR switches.Comment: 9 pages, 9 figures, added reference to final peer reviewed version on conference paper and DO

    Real-time cortical simulations: energy and interconnect scaling on distributed systems

    Full text link
    We profile the impact of computation and inter-processor communication on the energy consumption and on the scaling of cortical simulations approaching the real-time regime on distributed computing platforms. Also, the speed and energy consumption of processor architectures typical of standard HPC and embedded platforms are compared. We demonstrate the importance of the design of low-latency interconnect for speed and energy consumption. The cost of cortical simulations is quantified using the Joule per synaptic event metric on both architectures. Reaching efficient real-time on large scale cortical simulations is of increasing relevance for both future bio-inspired artificial intelligence applications and for understanding the cognitive functions of the brain, a scientific quest that will require to embed large scale simulations into highly complex virtual or real worlds. This work stands at the crossroads between the WaveScalES experiment in the Human Brain Project (HBP), which includes the objective of large scale thalamo-cortical simulations of brain states and their transitions, and the ExaNeSt and EuroExa projects, that investigate the design of an ARM-based, low-power High Performance Computing (HPC) architecture with a dedicated interconnect scalable to million of cores; simulation of deep sleep Slow Wave Activity (SWA) and Asynchronous aWake (AW) regimes expressed by thalamo-cortical models are among their benchmarks.Comment: 8 pages, 8 figures, 4 tables, submitted after final publication on PDP2019 proceedings, corrected final DOI. arXiv admin note: text overlap with arXiv:1812.04974, arXiv:1804.0344

    TEXTAROSSA: Towards EXtreme scale Technologies and Accelerators for euROhpc hw/Sw Supercomputing Applications for exascale

    Get PDF
    International audienceTo achieve high performance and high energy efficiency on near-future exascale computing systems, three key technology gaps needs to be bridged. These gaps include: energy efficiency and thermal control; extreme computation efficiency via HW acceleration and new arithmetics; methods andtools for seamless integration of reconfigurable accelerators in heterogeneous HPC multi-node platforms. TEXTAROSSA aims at tackling this gap through a co-design approach to heterogeneous HPC solutions, supported by the integration and extension of HW and SW IPs, programming models and tools derived from European research

    The Latest Version of SiFAP: Beyond Microsecond Time Scale Photometry of Variable Objects

    No full text
    Technical improvements of the Silicon Fast optical Astronomical Photometer (SiFAP) allow the instrumentation to integrate photons coming from the target in time windows down to 20ÎĽs. Further hardware improvement has been implemented to tag the Time of Arrival (ToA) of each single photon. In addition, a new commercial GPS unit has replaced the older commercial unit improving time resolution. The latest version of SiFAP has been calibrated to check photometric sensitivity and linearity through observations of several standard stars. SiFAP has been also successfully tested by observing the HZ/Her X-1 Binary System estimating the spin period of the pulsar (Her X-1). Our results have been then compared to data available in literature

    EuroEXA Custom Switch: an innovative FPGA-based system for extreme scale computing in Europe

    Get PDF
    EuroEXA is a major European FET research initiative that aims to deliver a proof-of-concept of a next generation Exa-scalable HPC platform. EuroEXA leverages on previous projects results (ExaNeSt, ExaNoDe and ECOSCALE) to design a medium scale but scalable, fully working HPC system prototype exploiting state-of-the-art FPGA devices that integrate compute accelerators and low-latency high-throughputnetwork. Exascale-class systems are expected to host a very large number of computing nodes, from 104 up to 105, so that capability and performances of the interconnect architecture are critical to achieve high computing efficiency at this scale. In this perspective, EuroEXA enhances the ExaNet architecture, inherited by the ExaNeSt project, and introduces a multi-tier, hybrid topology network built on top of an FPGA-integrated Custom Switch that provides high throughput and low inter-node traffic latency for the different layers of the network hierarchy. Deployment of a few testbeds is planned, with incremental complexity and equipped with complete software stack and runtime environment, to support the integration and test of the network design and to allow for evaluation of system performance and scalability through benchmarks based on real HPC applications. Design and integration activities are ongoing and the first small scale prototype (50 nodes) is expected to be completed in fall 2020 followed, one year later, by the deployment of the larger prototype (250/500 nodes)

    EuroEXA Custom Switch: an innovative FPGA-based system for extreme scale computing in Europe

    No full text
    EuroEXA is a major European FET research initiative that aims to deliver a proof-of-concept of a next generation Exa-scalable HPC platform. EuroEXA leverages on previous projects results (ExaNeSt, ExaNoDe and ECOSCALE) to design a medium scale but scalable, fully working HPC system prototype exploiting state-of-the-art FPGA devices that integrate compute accelerators and low-latency high-throughputnetwork. Exascale-class systems are expected to host a very large number of computing nodes, from 104 up to 105, so that capability and performances of the interconnect architecture are critical to achieve high computing efficiency at this scale. In this perspective, EuroEXA enhances the ExaNet architecture, inherited by the ExaNeSt project, and introduces a multi-tier, hybrid topology network built on top of an FPGA-integrated Custom Switch that provides high throughput and low inter-node traffic latency for the different layers of the network hierarchy. Deployment of a few testbeds is planned, with incremental complexity and equipped with complete software stack and runtime environment, to support the integration and test of the network design and to allow for evaluation of system performance and scalability through benchmarks based on real HPC applications. Design and integration activities are ongoing and the first small scale prototype (50 nodes) is expected to be completed in fall 2020 followed, one year later, by the deployment of the larger prototype (250/500 nodes)

    L0TP+: the Upgrade of the NA62 Level-0 Trigger Processor

    Get PDF
    The L0TP+ initiative is aimed at the upgrade of the FPGA-based Level-0 Trigger Processor (L0TP) of the NA62 experiment at CERN for the post-LS2 data taking, which is expected to happen at 100% of design beam intensity, corresponding to about 3.3 Ă— 1012 protons per pulse on the beryllium target used to produce the kaons beam. Although tests performed at the end of 2018 showed a substantial robustness of the L0TP system also at full beam intensity, there are several reasons to motivate such an upgrade: i) avoid FPGA platform obsolescence, ii) make room for improvements in the firmware design leveraging a more capable FPGA device, iii) add new functionalities, iv) support the 4 beam intensity increase foreseen in future experiment upgrades. We singled out the Xilinx Virtex UltraScale+ VCU118 development board as the ideal platform for the project. L0TP+ seamless integration into the current NA62 TDAQ system and exact matching of L0TP functionalities represent the main requirements and focus of the project; nevertheless, the final design will include additional features, such as a PCIe RDMA engine to enable processing on CPU and GPU accelerators, and the partial reconfiguration of trigger firmware starting from a high level language description (C/C++). The latter capability is enabled by modern High Level Synthesis (HLS) tools, but to what extent this methodology can be applied to perform complex tasks in the L0 trigger, with its stringent latency requirements and the limits imposed by single FPGA resources, is currently being investigated. As a test case for this scenario we considered the online reconstruction of the RICH detector rings on an HLS generated module, using a dedicated primitives data stream with PM hits IDs. Besides, the chosen platform supports the Virtex Ultrascale+ FPGA wide I/O capabilities, allowing for straightforward integration of primitive streams from additional sub-detectors in order to improve the performance of the trigger

    L0TP+: the Upgrade of the NA62 Level-0 Trigger Processor

    No full text
    The L0TP+ initiative is aimed at the upgrade of the FPGA-based Level-0 Trigger Processor (L0TP) of the NA62 experiment at CERN for the post-LS2 data taking, which is expected to happen at 100% of design beam intensity, corresponding to about 3.3 Ă— 1012 protons per pulse on the beryllium target used to produce the kaons beam. Although tests performed at the end of 2018 showed a substantial robustness of the L0TP system also at full beam intensity, there are several reasons to motivate such an upgrade: i) avoid FPGA platform obsolescence, ii) make room for improvements in the firmware design leveraging a more capable FPGA device, iii) add new functionalities, iv) support the 4 beam intensity increase foreseen in future experiment upgrades. We singled out the Xilinx Virtex UltraScale+ VCU118 development board as the ideal platform for the project. L0TP+ seamless integration into the current NA62 TDAQ system and exact matching of L0TP functionalities represent the main requirements and focus of the project; nevertheless, the final design will include additional features, such as a PCIe RDMA engine to enable processing on CPU and GPU accelerators, and the partial reconfiguration of trigger firmware starting from a high level language description (C/C++). The latter capability is enabled by modern High Level Synthesis (HLS) tools, but to what extent this methodology can be applied to perform complex tasks in the L0 trigger, with its stringent latency requirements and the limits imposed by single FPGA resources, is currently being investigated. As a test case for this scenario we considered the online reconstruction of the RICH detector rings on an HLS generated module, using a dedicated primitives data stream with PM hits IDs. Besides, the chosen platform supports the Virtex Ultrascale+ FPGA wide I/O capabilities, allowing for straightforward integration of primitive streams from additional sub-detectors in order to improve the performance of the trigger

    The Next Generation of Exascale-Class Systems:The ExaNeSt Project

    No full text
    The ExaNeSt project started on December 2015 and is funded by EU H2020 research framework (call H2020-FETHPC-2014, n. 671553) to study the adoption of low-cost, Linux-based power-efficient 64-bit ARM processors clusters for Exascale-class systems. The ExaNeSt consortium pools partners with industrial and academic research expertise in storage, interconnects and applications that share a vision of an European Exascale-class supercomputer. Their goal is designing and implementing a physical rack prototype together with its cooling system, the storage non-volatile memory (NVM) architecture and a low-latency interconnect able to test different options for interconnection and storage. Furthermore, the consortium is to provide real HPC applications to validate the system. Herein we provide a status report of the project initial developments.To appear in: the Proceedings of the Euromicro Conference on Digital System Design (DSD 2017), Vienna, Austria, 30 August - 1 September, 201
    corecore